The detection and prevention of illegal fishing is critical to maintaining a healthy and functional ecosystem. Recent research on ship detection in satellite imagery has focused exclusively on improving detection performance while disregarding efficiency. However, the speed and compute cost of vessel detection are essential for timely intervention against illegal fishing. We therefore investigated optimization methods that lower detection time and cost with minimal performance loss. We trained an object detection model based on a convolutional neural network (CNN) on a dataset of satellite images. We then designed two efficiency optimizations that can be applied to this base CNN or to any other base model: a fast, cheap classification model and a statistical algorithm. Integrating these optimizations with the object detection model creates a trade-off between speed and performance, which we studied using metrics that weight execution time and performance differently. We show that, by using a classification model, the detection model's average precision can be kept at 99.5% of its original value using 44% of the execution time, or at 92.7% using 25% of the execution time.
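The abstract does not say how the classifier and the detector are combined. One common arrangement, assumed here, is a cascade: the cheap classifier screens each image tile, and the expensive detector runs only on tiles flagged as likely to contain a vessel. This is a minimal sketch of that assumption. `cascade_detect`, the tile representation, the box format and the `threshold` parameter are all hypothetical, and the statistical algorithm mentioned above is not modeled.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in tile coordinates.
Box = Tuple[float, float, float, float]


def cascade_detect(
    tiles: Sequence[object],
    classify: Callable[[object], float],
    detect: Callable[[object], List[Box]],
    threshold: float = 0.5,
) -> Tuple[List[List[Box]], int]:
    """Run the costly detector only on tiles the cheap classifier flags.

    Returns the per-tile detections (an empty list for skipped tiles) and
    the number of detector calls, a rough proxy for compute cost.
    """
    detections: List[List[Box]] = []
    detector_calls = 0
    for tile in tiles:
        if classify(tile) >= threshold:
            # Classifier thinks a vessel may be present: pay for detection.
            detections.append(detect(tile))
            detector_calls += 1
        else:
            # Skipped tile: treated as containing no vessel.
            detections.append([])
    return detections, detector_calls


if __name__ == "__main__":
    # Toy tiles standing in for satellite image crops.
    tiles = [{"ship": True}, {"ship": False}, {"ship": False}, {"ship": True}]
    score = lambda t: 0.9 if t["ship"] else 0.1
    boxes = lambda t: [(0.0, 0.0, 1.0, 1.0)]
    dets, calls = cascade_detect(tiles, score, boxes)
    print(f"detector ran on {calls} of {len(tiles)} tiles")  # prints: detector ran on 2 of 4 tiles
```

Raising `threshold` skips more tiles and lowers cost. It also risks missing vessels that the classifier scores low, which is the kind of speed/performance trade-off the abstract describes.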
In this work, we study the problem of Embodied Referring Expression Grounding, in which an agent must navigate a previously unseen environment and localize a remote object described by a concise, high-level natural language instruction. In such a situation, a human tends to imagine what the destination may look like and to explore the environment using prior knowledge of its layout, for example that a bathroom is more likely to be found near a bedroom than near a kitchen. To mimic this cognitive decision process, we designed an autonomous agent called Layout-aware Dreamer (LAD) with two novel modules: the Layout Learner and the Goal Dreamer. The Layout Learner learns to infer the room-category distribution of neighboring unexplored areas along the path for coarse layout estimation, which effectively gives our agent common-sense knowledge of room-to-room transitions. To explore the environment effectively, the Goal Dreamer imagines the destination beforehand. On the public leaderboard of the REVERIE dataset, our agent achieves new state-of-the-art performance in challenging unseen test environments, improving navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43% over the previous state of the art. The code is released at
Words of estimative probability (WEP) are expressions of a statement's plausibility (probably, maybe, likely, doubt, unlikely, impossible...). Multiple surveys show that human evaluators agree when assigning numerical probability levels to WEP. For example, highly likely corresponds to a median chance of 0.90 ± 0.08 in Fagen-Ulmschneider (2015)'s survey. In this work, we measure the ability of neural language processing models to capture the consensual probability level associated with each WEP. First, we use the UNLI dataset (Chen et al., 2020), which associates premises and hypotheses with their perceived joint probability p, to construct prompts, e.g. "[PREMISE]. [WEP], [HYPOTHESIS].", and assess whether language models can predict whether the consensual probability level of the WEP is close to p. Second, we construct a dataset of WEP-based probabilistic reasoning to test whether language models can reason with WEP compositions. When prompted with "[EVENTA] is likely. [EVENTB] is impossible.", a causal language model should not express that [EVENTA&B] is likely. We show that both tasks are unsolved by off-the-shelf English language models, but that fine-tuning leads to transferable improvement.
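The first task turns each UNLI premise/hypothesis pair into a prompt and asks whether the WEP's consensual level is close to p. The second task relies on the fact that a conjunction can be no more probable than either conjunct, so P(A and B) ≤ min(P(A), P(B)). This is a minimal sketch of both ideas. It assumes a fixed tolerance for "close", which the abstract does not specify. Only the 0.90 level for highly likely comes from the text; the function names and the `tolerance` default are hypothetical.

```python
from typing import Dict, Mapping

# Median level for "highly likely" from Fagen-Ulmschneider (2015), as quoted
# in the abstract; other WEP levels would have to be taken from such surveys.
WEP_LEVELS: Dict[str, float] = {"highly likely": 0.90}


def build_prompt(premise: str, wep: str, hypothesis: str) -> str:
    """Fill the "[PREMISE]. [WEP], [HYPOTHESIS]." template."""
    wep_text = wep[:1].upper() + wep[1:]  # the WEP starts a new sentence
    return f"{premise.rstrip('.')}. {wep_text}, {hypothesis.rstrip('.')}."


def wep_is_close(
    wep: str,
    p: float,
    tolerance: float = 0.1,  # hypothetical threshold for "close to p"
    levels: Mapping[str, float] = WEP_LEVELS,
) -> bool:
    """Label whether a WEP's consensual level lies within `tolerance` of p."""
    return abs(levels[wep] - p) <= tolerance


def conjunction_upper_bound(p_a: float, p_b: float) -> float:
    """P(A and B) can never exceed min(P(A), P(B))."""
    return min(p_a, p_b)


if __name__ == "__main__":
    print(build_prompt("A man is sprinting down a track.", "highly likely", "he is an athlete"))
    print(wep_is_close("highly likely", 0.85))
    # If B is impossible (probability 0), "A and B" cannot be likely.
    print(conjunction_upper_bound(0.9, 0.0))
```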
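The dialogue policy module is an essential component of task-completion dialogue systems. Recently, interest has increasingly focused on dialogue policies learned with reinforcement learning (RL). Their favorable performance and sensible action decisions depend on accurate estimates of action values. Overestimation is a well-known problem in RL: the estimated maximum action value is larger than the ground truth, which leads to an unstable learning process and a suboptimal policy. This problem is detrimental to RL-based dialogue policy learning. To mitigate it, this paper proposes a dynamic partial average estimator (DPAV) of the ground-truth maximum action value. DPAV computes a partial average between the predicted maximum and minimum action values, where the weights are dynamically adaptive and problem-dependent. We incorporate DPAV into a dialogue policy and show that our method achieves better or comparable results on three dialogue datasets from different domains, with a lower computational load. In addition, we theoretically prove convergence and derive upper and lower bounds on the bias, comparing them with those of other methods.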
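In a Q-learning-style update, the usual bootstrap target uses the maximum predicted next-state action value, which is where overestimation enters. As described in the abstract, DPAV replaces that maximum with a weighted partial average of the predicted maximum and minimum action values, i.e. a target of the form r + γ [w · max_a Q(s', a) + (1 − w) · min_a Q(s', a)]. This minimal sketch shows only that combination. The abstract does not say how the weight is adapted, so `weight` is simply passed in. The TD-target form, `gamma` and the function names are assumptions.

```python
from typing import Sequence


def partial_average(q_values: Sequence[float], weight: float) -> float:
    """Blend the max and min predicted action values.

    weight = 1.0 recovers the standard max operator (prone to
    overestimation); smaller weights pull the estimate toward the min.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * max(q_values) + (1.0 - weight) * min(q_values)


def td_target(
    reward: float,
    next_q_values: Sequence[float],
    weight: float,
    gamma: float = 0.99,  # assumed discount factor
    done: bool = False,
) -> float:
    """Bootstrap target that uses the partial average instead of the max."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * partial_average(next_q_values, weight)


if __name__ == "__main__":
    q_next = [1.0, 3.0, 2.0]
    print(td_target(1.0, q_next, weight=1.0, gamma=0.5))  # max target: 2.5
    print(td_target(1.0, q_next, weight=0.5, gamma=0.5))  # partial average: 2.0
```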
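Recognizing human actions is fundamentally a spatio-temporal reasoning problem, and should be, at least to some extent, invariant to the appearance of the humans and objects involved. Motivated by this hypothesis, in this work we take an object-centric approach to action recognition. Multiple works have studied this setting before, but it remains unclear (i) how well a carefully crafted method based on spatio-temporal layouts can recognize human actions, and (ii) how, and when, to fuse information from layout-based and appearance-based models. The main focus of this paper is compositional/few-shot action recognition, where we advocate the use of multi-head attention (which has been shown to be effective for spatial reasoning) over spatio-temporal layouts, i.e., configurations of object bounding boxes. We evaluate different schemes for injecting video appearance information into the system and benchmark our approach on background-cluttered action recognition. On the Something-Else and Action Genome datasets, we demonstrate (i) how to extend multi-head attention to spatio-temporal layout-based action recognition, (ii) how to improve the performance of appearance-based models by fusing them with layout-based models, and (iii) that even on non-compositional, background-cluttered video datasets, fusing layout-based and appearance-based models improves performance.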
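A layout-based model can treat every object bounding box in every frame as a token and let multi-head attention relate these tokens across space and time. This minimal NumPy sketch illustrates that idea under several assumptions:

- Boxes arrive as a (frames, objects, 4) array of normalized (x1, y1, x2, y2) coordinates.
- Random matrices stand in for learned weights.
- A per-frame embedding is the assumed way of encoding time.
- Appearance fusion is omitted.

It illustrates multi-head self-attention over a layout and is not the authors' architecture. The function names are hypothetical.

```python
import numpy as np


def embed_layout(boxes: np.ndarray, d_model: int, rng: np.random.Generator) -> np.ndarray:
    """Map (T, N, 4) boxes to (T*N, d_model) tokens in frame-major order."""
    t, n, _ = boxes.shape
    w_box = rng.standard_normal((4, d_model))        # stand-in for a learned projection
    frame_emb = rng.standard_normal((t, d_model))    # stand-in for learned time embeddings
    tokens = boxes.reshape(t * n, 4) @ w_box
    return tokens + np.repeat(frame_emb, n, axis=0)  # row t*n + j gets frame t's embedding


def multi_head_self_attention(x: np.ndarray, num_heads: int, rng: np.random.Generator) -> np.ndarray:
    """Scaled dot-product self-attention with `num_heads` heads over all tokens."""
    n, d = x.shape
    if d % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d // num_heads
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(m: np.ndarray) -> np.ndarray:  # (n, d) -> (heads, n, d_head)
        return m.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)              # softmax over keys
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)     # concatenate heads
    return out @ w_o


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    boxes = rng.uniform(size=(8, 3, 4))    # toy layout: 8 frames, 3 objects per frame
    tokens = embed_layout(boxes, d_model=16, rng=rng)
    features = multi_head_self_attention(tokens, num_heads=4, rng=rng)
    video_feature = features.mean(axis=0)  # pooled layout representation
    print(tokens.shape, features.shape, video_feature.shape)
```

Mean pooling over tokens is one simple way to get a single video-level vector that a classifier head, or a fusion step with an appearance model, could consume.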